Metagenome Analysis

Only ancestral, streptomycin and control treatments are included, unless otherwise stated.

Authors

Niina Smolander

Shane Hogle

Libraries and functions

Prior to this analysis, the variants have been filtered using Mutect2’s FilterMutectCalls as well as separately for SNPs and INDELs using the following hard filtering factors: SOR, FS, MQ, MQRankSum, ReadPosRankSum, SNP/INDEL clustering (3/35) and SNP/INDEL proximity.

Code
library(tidyverse)
library(Polychrome)
library(withr)
library(fs)
library(compositions)
library(vegan)
library(qvalue)

source("data/input/utils_parallelism.R")

1) Variants in genes of interest

Checking the prevalence of variants in genes of interest known to be related to the streptomycin resistance in bacteria. Variants only found in rpsL and rsmG (gidB) genes.

Code
variants_strep <- read_rds("data/input/variants_strep.rds") %>%
  # exclude intragenic, intergenic, and synonymous mutations. Also exclude
  # fusion functions because these are weird and also rare. Excluding the
  # modifier category also ensures that we filter out any tRNAs with mutations
  filter(!str_detect(EFFECT, "intergenic|intragenic|synonymous|fusion")) %>% 
  filter(!str_detect(IMPACT, "MODIFIER")) 

eggnog <- read_tsv("data/input/HAMBI_all_eggnog_formatted.tsv.xz")
degentab <- read_rds("data/input/annotations_codon_degeneracy.rds")

variants_strep <- left_join(variants_strep, degentab, by = join_by(GENE == locus_tag)) %>% 
  rename(prokka_annotation = product) %>%
  left_join(.,eggnog, by = join_by(GENE == locus_tag)) %>%
  relocate(Home, Measure, COG_category_long, Preferred_name, Description, prokka_annotation)

rpsl <- variants_strep %>%
  filter(gene == "rpsL")

rsmg <- variants_strep %>%
  filter(gene == "rsmG")


rpsl_table <- rpsl %>%
  select(home_env, measure_env, replicate, HAMBI, VAF, GENE, POS, ALT) %>%
  arrange(desc(measure_env), home_env)

rsmg_table <- rsmg %>%
  select(home_env, measure_env, replicate, HAMBI, VAF, GENE, POS, ALT) %>%
  arrange(desc(measure_env), home_env)

# rpsl %>%
#   ggplot(aes(x = replicate_variant, y = VAF)) +
#   geom_bar(stat = "identity") +
#   facet_wrap(home_env ~ measure_env)
# 
# rsmg %>%
#   ggplot(aes(x = replicate, y = VAF)) +
#   geom_bar(stat = "identity") +
#   facet_wrap(home_env+measure_env ~ HAMBI)

# check <- variants_strep %>%
#   filter(HAMBI == "HAMBI_1977", home_env == "Home: Anc", replicate == "A")

1.1) rpsL gene VAF table

In some cases a mutation in the rpsL gene (see the table below) seems to correlate with the community structure (marked with arrows in the barplot). In the t0 and the measure: BS of home: BS HAMBI_0006 (P. putida) is much more abundant in the replicate A than in the other replicates. Same applies for the replicate F and HAMBI_1972 (A. caviae) in the t0 home:BS. Also Home: B Meas: BS replicate C HAMBI_1972 rpsL is mutated. When transferred from streptomycin to no-strep (Home: BS; Meas: B) HAMBI_0006 (replicate A) VAF is only 0.18, presumably due to the fact that after t0 at the beginning of the experiment small amount of ANC HAMBIs were added to the communities. The mutation in all of these is Lys88Arg, which is the same as in the YSK, except for in the Home: Anc Meas: BS HAMBI_2443 in replicate A, where it is Lys43Arg, but in that the abundance of the H_2443 in the community seems to be very low. In the YSK paper, the streptomycin replicate 6 (= replicate F) community has rpsL mutated A. caviae (H_0006) and our results match that. In the YSK paper, P. putida (HAMBI_0006) is mutated before the week 50 in the steptomycin replicate 1 (= replicate A) whereas in the other replicates, the mutation takes place well after the week 50. This is in line with our results, but need to confirm at what week did our experiment start.

Code
rmarkdown::paged_table(rpsl_table)

HAMBI abundances

1.2) rsmG (gidB) gene VAF table

In the t0 Home: BS HAMBI_1977 (yellow in barplot) rsmG mutated in all of the replicates, but the position and variant varies. Mutations also in replicates B and C of Home: Anc Meas: BS, but not in the replicate A; in Home: B Meas: BS and in Home: BS Meas: BS in all replicates. In the Home: BS Meas: B replicates A and C, the VAF is lower than in the t0 Home: BS. Other HAMBIs where variants have been found seem to be very low in abundance (based on the barplot).

Code
rmarkdown::paged_table(rsmg_table)

HAMBI abundances
Code
# checking rrs gene variants, but none found
# varfiles <- list.files(path = "data/input/tables_better_annotated/")
# varfiles_snpeff <- varfiles[base::grep("snpeff", varfiles)]
# 
# variants <- list()
# for(i in varfiles_snpeff){
#   name <- str_remove(i, "_L003.VF.HF.snpeff.tsv")
#   variants[[name]] <- read_tsv(paste0("data/input/tables_better_annotated/", i),
#                                skip = 1,
#                                col_names = c("CHROM",
#                                              "POS",
#                                              "REF",
#                                              "ALT",
#                                              "FILTER",
#                                              "EFFECT",
#                                              "IMPACT",
#                                              "GENE",
#                                              "BIOTYPE",
#                                              "HGVS_C",
#                                              "HGVS_P",
#                                              "CDS_POS",
#                                              "CDS_LEN",
#                                              "AA_POS",
#                                              "AA_LEN"),
#                                col_types = c("cdcccccccccdddd"))
# }
# 
# 
# variants <- bind_rows(variants, .id = "Sample") %>%
#   filter(FILTER == "PASS")
# 
# varfiles_variants <- varfiles[base::grep("variants", varfiles)]
# 
# variants2 <- list()
# for(i in varfiles_variants){
#   name <- str_remove(i, "_L003.VF.HF.variants.tsv")
#   variants2[[name]] <- read_tsv(paste0("data/input/tables_better_annotated/", i),
#                                 skip = 1,
#                                 col_names = c("CHROM",
#                                               "POS",
#                                               "REF",
#                                               "ALT",
#                                               "FILTER",
#                                               "TYPE",
#                                               "AD",
#                                               "AF",
#                                               "DP",
#                                               "FAD",
#                                               "GQ",
#                                               "VAF"),
#                                 col_types = c("cdcccccccccc"))
# }
# 
# 
# variants2 <- bind_rows(variants2, .id = "Sample") %>%
#   filter(FILTER == "PASS")
# 
# # Combine and format rows with multiple ALTs
# 
# variants_all <- full_join(variants, variants2,
#                           by = join_by(Sample, CHROM, POS, REF, ALT, FILTER)) %>%
#   separate_wider_delim(cols = "Sample", delim = "-", names = c("HAMBI", "Sample"))
# 
# 
# variants_all_tidy <- variants_all %>%
#   separate_wider_delim(cols = "AD", delim = ",", names = c("REF_AD" ,"ALT1_AD", "ALT2_AD", "ALT3_AD", "ALT4_AD"), too_few = "align_start") %>%
#   separate_wider_delim(cols = "FAD", delim = ",", names = c("REF_FAD" ,"ALT1_FAD", "ALT2_FAD", "ALT3_FAD", "ALT4_FAD"), too_few = "align_start") %>%
#   mutate(ALT_AD = paste(ALT1_AD, ALT2_AD, ALT3_AD, ALT4_AD, sep = ","),
#          ALT_FAD = paste(ALT1_FAD, ALT2_FAD, ALT3_FAD, ALT4_FAD, sep = ",")) %>%
#   mutate(ALT_AD = str_remove_all(ALT_AD, ",NA"),
#          ALT_FAD = str_remove_all(ALT_FAD, ",NA")) %>%
#   select(-c("ALT1_AD", "ALT2_AD", "ALT3_AD", "ALT4_AD", "ALT1_FAD", "ALT2_FAD", "ALT3_FAD", "ALT4_FAD", "FILTER", "GQ")) %>%
#   separate_longer_delim(cols = c("ALT", "AF", "VAF", "ALT_AD", "ALT_FAD"), delim = ",") %>%
#   mutate(across(c("AF", "DP", "VAF", "REF_AD", "ALT_AD", "REF_FAD", "ALT_FAD"), as.numeric))
# 
# 
# rrs_genes <- read_tsv("data/input/gene_names_and_products_key.tsv") %>%
#   filter(product == "16S ribosomal RNA")
# 
# variants_all_tidy %>%
#   filter(GENE %in% rrs_genes$ID)